add qwen3-next #83
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
Deploying vllm-blog-source with Cloudflare Pages
| Latest commit: | 3952ece |
| Status: | ✅ Deploy successful! |
| Preview URL: | https://df2ea9e0.vllm-blog-source.pages.dev |
| Branch Preview URL: | https://qwen3.vllm-blog-source.pages.dev |
Signed-off-by: heheda <zhangch99@outlook.com>
Signed-off-by: heheda <zhangch99@outlook.com>
Signed-off-by: heheda <zhangch99@outlook.com>
* Further kernel optimizations for GatedDeltaNet layers.
* Better memory management and prefix caching for hybrid models.
* Continuous throughput and CPU overhead reductions.
Should we specifically call out automatic prefix caching (which is a prerequisite for P/D disaggregation)?
We could reference this WIP PR: vllm-project/vllm#23941
Added P/D. But I'd prefer not to reference ongoing PRs, since the author may close this one and open another.
Signed-off-by: heheda <zhangch99@outlook.com>
No description provided.